
Introduction to Time Series Analysis - 04


This note is for course MATH 545 at McGill University.

Lecture 10 - Lecture 12


Estimation of the mean $\mu$ and the autocovariance/autocorrelation functions $\gamma(h)$, $\rho(h)$ for a stationary process $\{X_t\}$ with $E[X_t]=\mu$

$$\hat{\mu}=\bar{X}_{n}=\frac{X_{1}+\cdots+X_{n}}{n}$$

$$E\left[\bar{X}_{n}\right]=\frac{E\left[X_{1}\right]+\cdots+E\left[X_{n}\right]}{n}=\mu$$

$$E\left[\left(\bar{X}_{n}-\mu\right)^{2}\right]=\operatorname{MSE}\left(\bar{X}_{n}\right)=\operatorname{Var}\left(\bar{X}_{n}\right)$$

$$\begin{aligned} \operatorname{Var}\left(\bar{X}_{n}\right) &=\operatorname{Var}\left(\frac{1}{n}\left[X_{1}+\cdots +X_{n}\right]\right) \\ &=\frac{1}{n^{2}} \operatorname{Var}\left(X_{1}+\cdots+X_{n}\right) \\ &=\frac{1}{n^{2}} \sum_{i=1}^{n} \sum_{j=1}^{n}\operatorname{Cov}\left(X_{i}, X_{j}\right) \\ &=\frac{1}{n^{2}} \sum_{i=1}^{n} \sum_{j=1}^{n} \gamma(i-j) \\ &=\frac{1}{n^{2}} \sum_{h=-n}^{n}(n-|h|) \gamma(h) \\ &=\frac{1}{n} \sum_{h=-n}^{n}\left(1-\frac{|h|}{n}\right) \gamma(h) \end{aligned}$$

(Note that here $(n-|h|)$ is the number of observed pairs at lag $h$. Since $0\leq |i-j| \leq n-1$, there are $n$ possible values of $|i-j|$.)
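(Not from the lecture, just a sanity check.) The sketch below compares $\operatorname{Var}(\bar X_n)$ computed directly from the full $n\times n$ covariance matrix with the lag-weighted formula above; the MA(1)-type autocovariance and the values of $\theta$, $\sigma^2$, $n$ are arbitrary choices for the demonstration.

```python
import numpy as np

# Illustrative autocovariance: MA(1) with theta = 0.6, sigma^2 = 1 (arbitrary choices for the demo).
theta, sigma2 = 0.6, 1.0

def gamma(h):
    h = abs(h)
    if h == 0:
        return (1 + theta**2) * sigma2
    return theta * sigma2 if h == 1 else 0.0

n = 200

# Direct computation: Var(X_bar) = (1/n^2) * sum_{i,j} Cov(X_i, X_j) = (1/n^2) * sum_{i,j} gamma(i - j).
Sigma = np.array([[gamma(i - j) for j in range(n)] for i in range(n)])
var_direct = Sigma.sum() / n**2

# Lag-weighted formula: Var(X_bar) = (1/n) * sum_{|h| < n} (1 - |h|/n) * gamma(h).
var_formula = sum((1 - abs(h) / n) * gamma(h) for h in range(-(n - 1), n)) / n

print(var_direct, var_formula)  # the two values agree
```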

If $\gamma(h) \rightarrow 0$ as $h \rightarrow \infty$, then $\operatorname{Var}\left(\bar{X}_{n}\right)=\operatorname{MSE}\left(\bar{X}_{n}\right) \rightarrow 0$ as $n \rightarrow \infty$.

If $\sum_{h=-\infty}^{\infty}|\gamma(h)|<\infty$, then $\lim _{n \rightarrow \infty} n \operatorname{Var}\left(\bar{X}_{n}\right)=\sum_{h=-\infty}^{\infty}\gamma(h)$.

If $\{X_t\}$ is also Gaussian, then $\bar{X}_{n} \sim N\left(\mu, \frac{1}{n} \sum_{|h|<n}\left(1-\frac{|h|}{n}\right) \gamma(h)\right)$.

To do testing and interval estimation, use $\bar{X}_{n} \pm z_{\alpha / 2} \frac{\sqrt{\Gamma}}{\sqrt{n}}$, where $\Gamma = \sum_{|h| < \infty} \gamma(h)$ (for large $n$, $\operatorname{Var}(\bar X_n)\approx \Gamma/n$).

In practice, plug in the estimate $\hat{\Gamma}=\sum_{|h|\le k}\hat{\gamma}(h)$ for some maximum lag $k$.

Example: AR(1) process

Let $\{X_t\}$ be defined by $X_t - \mu = \phi(X_{t-1}-\mu) + Z_t$, where $|\phi| < 1$ and $\{Z_t\}\sim WN(0, \sigma^2)$.

So,
$$\begin{aligned} \Gamma &=\sum_{|h| < \infty} \gamma(h) = \sum_{|h| < \infty} \frac{\sigma^2 \phi^{|h|}}{1-\phi^2} \\ &=\left(1+2\sum^{\infty}_{h=1} \phi^{h}\right)\frac{\sigma^2}{1-\phi^2} \\ &=\left(1+\frac{2\phi}{1-\phi}\right)\frac{\sigma^2}{1-\phi^2} \\ &=\frac{1-\phi+2\phi}{1-\phi}\cdot\frac{\sigma^2}{(1+\phi)(1-\phi)} \\ &=\frac{\sigma^2}{(1-\phi)^2} \end{aligned}$$

So the 95% confidence interval for $\mu$ is:

$$\bar{X}_n \pm \frac{1.96}{\sqrt{n}}\frac{\sigma}{\sqrt{(1-\phi)^2}} = \bar{X}_n \pm \frac{1.96}{\sqrt{n}}\frac{\sigma}{1-\phi}$$
(since $|\phi|<1$ implies $1-\phi>0$).
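(A side check, not part of the notes.) A small simulation can verify that the interval $\bar X_n \pm \frac{1.96}{\sqrt n}\frac{\sigma}{1-\phi}$ covers $\mu$ roughly 95% of the time; the parameter values, sample size, burn-in length, and number of replications below are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
phi, sigma, mu = 0.5, 1.0, 2.0      # illustrative AR(1) parameters
n, burn, reps = 300, 200, 1000      # illustrative sample size / burn-in / replications

covered = 0
for _ in range(reps):
    # Simulate X_t - mu = phi (X_{t-1} - mu) + Z_t, discarding a burn-in to reach stationarity.
    z = rng.normal(0.0, sigma, size=n + burn)
    x = np.empty(n + burn)
    x[0] = mu
    for t in range(1, n + burn):
        x[t] = mu + phi * (x[t - 1] - mu) + z[t]
    x = x[burn:]

    half_width = 1.96 * sigma / ((1 - phi) * np.sqrt(n))
    covered += abs(x.mean() - mu) <= half_width

print(covered / reps)  # should be close to 0.95
```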

$$\hat{\gamma}(h)=\frac{1}{n}\sum_{t=1}^{n-|h|}(X_{t+|h|}-\bar{X}_n)(X_t-\bar{X}_n)$$

$$\hat{\rho}(h)=\frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}$$

Both are biased in finite samples, but consistent.
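(My own sketch of these estimators, not code from the course.) The functions below implement $\hat\gamma(h)$ and $\hat\rho(h)$ exactly as defined above, plus the plug-in interval $\bar X_n \pm 1.96\sqrt{\hat\Gamma/n}$; the names `sample_acvf`, `sample_acf`, `mean_ci` and the default truncation lag are my own choices.

```python
import numpy as np

def sample_acvf(x, h):
    """gamma_hat(h) = (1/n) * sum_{t=1}^{n-|h|} (x_{t+|h|} - x_bar)(x_t - x_bar)."""
    x = np.asarray(x, dtype=float)
    n, h, xbar = len(x), abs(h), x.mean()
    return np.sum((x[h:] - xbar) * (x[:n - h] - xbar)) / n

def sample_acf(x, h):
    """rho_hat(h) = gamma_hat(h) / gamma_hat(0)."""
    return sample_acvf(x, h) / sample_acvf(x, 0)

def mean_ci(x, max_lag=20, z=1.96):
    """x_bar +/- z * sqrt(Gamma_hat / n), with Gamma_hat = sum_{|h| <= max_lag} gamma_hat(h)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    Gamma_hat = sample_acvf(x, 0) + 2 * sum(sample_acvf(x, h) for h in range(1, max_lag + 1))
    half = z * np.sqrt(max(Gamma_hat, 0.0) / n)
    return x.mean() - half, x.mean() + half

# Example usage on simulated white noise (illustrative only).
rng = np.random.default_rng(1)
x = rng.normal(size=500)
print(sample_acf(x, 1), mean_ci(x))
```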

$$\hat{\Gamma}_k = \begin{bmatrix} \hat{\gamma}(0) & \hat{\gamma}(1) & \cdots & \hat{\gamma}(k-1) \\ \hat{\gamma}(1) & \hat{\gamma}(0) & \cdots & \hat{\gamma}(k-2) \\ \vdots & \vdots & \ddots & \vdots \\ \hat{\gamma}(k-1) & \hat{\gamma}(k-2) & \cdots & \hat{\gamma}(0) \end{bmatrix}$$

For all $k<n$, $\hat{\Gamma}_k$ will be non-negative definite. It is not obvious how to estimate $\gamma(h)$ for lags $h\geq n$, and even for $k$ close to $n$, $\hat{\Gamma}_k$ will be unstable.

(Box-Jenkins rule of thumb: use $n\geq 50$ and $h\leq \frac{n}{4}$.)

In large samples and for lags that are not too large, we can approximate the distribution of $(\hat{\rho}(1), \cdots, \hat{\rho}(k))$ by:

$$\hat{\rho} \sim MVN\left(\rho, \frac{1}{n}W\right)$$

where $W$ is a $k\times k$ covariance matrix whose elements are computed by a simplification of Bartlett's formula:

$$w_{ij}=\sum^\infty_{k=1}\{\rho(k+i)+\rho(k-i)-2\rho(k)\rho(i)\}\times\{\rho(k+j)+\rho(k-j)-2\rho(k)\rho(j)\}$$
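(Illustration only.) Bartlett's formula can be evaluated numerically by truncating the sum over $k$ at some large $K$; here `rho` is passed in as a function, and the truncation point $K=1000$ is an arbitrary choice that is more than enough for the short-memory examples below.

```python
import numpy as np

def bartlett_w(rho, k_max, K=1000):
    """W[i-1, j-1] = sum_{k=1}^{K} {rho(k+i) + rho(k-i) - 2 rho(k) rho(i)}
                                 * {rho(k+j) + rho(k-j) - 2 rho(k) rho(j)}."""
    W = np.zeros((k_max, k_max))
    for i in range(1, k_max + 1):
        for j in range(1, k_max + 1):
            W[i - 1, j - 1] = sum(
                (rho(k + i) + rho(k - i) - 2 * rho(k) * rho(i))
                * (rho(k + j) + rho(k - j) - 2 * rho(k) * rho(j))
                for k in range(1, K + 1)
            )
    return W

# Example: iid noise has rho(0) = 1 and rho(h) = 0 otherwise, which should give W = I
# (this is exactly the example worked out next).
rho_iid = lambda h: 1.0 if h == 0 else 0.0
print(bartlett_w(rho_iid, k_max=3))
```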

Example

$\{X_t\}$ iid $\Rightarrow$ $\rho(0)=1$ and $\rho(h)=0$ for $|h|\geq 1$

So $w_{ij}=\sum^\infty_{k=1}\rho(k-i)\rho(k-j)$

We have $w_{ii}=1$ and $w_{ij}=0$ for $i\neq j$, so $W=I$ and $\hat{\rho}(h) \sim N(0, \frac{1}{n})$ approximately.
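(A quick empirical check, not from the notes.) This is the usual justification for the $\pm 1.96/\sqrt{n}$ bounds on an ACF plot: for simulated iid noise, roughly 95% of the $\hat\rho(h)$ should fall inside them. The values of $n$ and the number of lags below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, max_lag = 1000, 40  # arbitrary illustrative choices

x = rng.normal(size=n)
xbar = x.mean()
gamma0 = np.sum((x - xbar) ** 2) / n

# Sample autocorrelations rho_hat(1), ..., rho_hat(max_lag).
rho_hat = np.array([np.sum((x[h:] - xbar) * (x[:n - h] - xbar)) / n / gamma0
                    for h in range(1, max_lag + 1)])

# Fraction of lags falling inside the +/- 1.96 / sqrt(n) bounds.
print(np.mean(np.abs(rho_hat) <= 1.96 / np.sqrt(n)))  # typically about 0.95
```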

Now consider an MA(1) process with $X_t=Z_t+\theta Z_{t-1}$, where $\{Z_t\} \sim WN(0, \sigma^2)$.

Then $X_t=\sum^\infty_{k=0}\psi_k Z_{t-k}$ with

$$\psi_k = \begin{cases} 1, & \text{for } k=0 \\ \theta, & \text{for } k=1 \\ 0, & \text{for } k\notin\{0, 1\} \end{cases}$$

So $$\gamma_X(h)=\sum^\infty_{j=-\infty} \psi_j \psi_{j-h}\sigma^2 = \begin{cases}(1+\theta^2)\sigma^2, & \text{for } h=0 \\ \theta\sigma^2, & \text{for } |h|=1 \\ 0, & \text{for } |h|\geq 2 \end{cases}$$

So $\rho(0) = 1$ and $\rho(\pm 1) = \frac{\gamma(1)}{\gamma(0)} = \frac{\theta}{1+\theta^2}$

From the formula $w_{ij}=\sum^\infty_{k=1}\{\rho(k+i)+\rho(k-i)-2\rho(k)\rho(i)\}\times\{\rho(k+j)+\rho(k-j)-2\rho(k)\rho(j)\}$, we have for $i=j$:

$$w_{ii}=\sum^\infty_{k=1}\{\rho(k+i)+\rho(k-i)-2\rho(k)\rho(i)\}^2$$

$$w_{11}=(\rho(0)-2\rho(1)^2)^2 + \rho(1)^2 = 1-3\rho(1)^2+4\rho(1)^4$$
(the $k=1$ and $k=2$ terms; all higher $k$ vanish)

So for the MA(1) case, $\hat{\rho}(1) \sim N\left(\frac{\theta}{1+\theta^2}, \frac{1}{n}\left(1-3\rho(1)^2+4\rho(1)^4\right)\right)$ approximately.

For $i>1$, $$w_{ii}=\sum^\infty_{k=1}\{\rho(k+i)+\rho(k-i)-2\rho(k)\rho(i)\}^2 =\rho(0)^2 + \rho(1)^2 + \rho(-1)^2 = 1+2\rho(1)^2$$

(Note: in this case $\rho(k+i)$ is always 0, $\rho(k)\rho(i)$ is always 0 since $\rho(i)=0$ for $i>1$, and $\rho(k-i)$ is nonzero only when $k = i-1, i, i+1$.)
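(Numerical check of the two MA(1) results above; $\theta$ is an arbitrary illustrative value.) Plugging $\rho(1)=\theta/(1+\theta^2)$ into a truncated version of Bartlett's sum should reproduce the closed forms $1-3\rho(1)^2+4\rho(1)^4$ and $1+2\rho(1)^2$.

```python
import numpy as np

theta = 0.7  # arbitrary MA(1) parameter
rho1 = theta / (1 + theta**2)

def rho(h):
    h = abs(h)
    return 1.0 if h == 0 else (rho1 if h == 1 else 0.0)

def w(i, j, K=1000):
    # Truncated Bartlett sum.
    return sum((rho(k + i) + rho(k - i) - 2 * rho(k) * rho(i))
               * (rho(k + j) + rho(k - j) - 2 * rho(k) * rho(j))
               for k in range(1, K + 1))

print(w(1, 1), 1 - 3 * rho1**2 + 4 * rho1**4)  # should agree
print(w(5, 5), 1 + 2 * rho1**2)                # should agree for i > 1
```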

Prediction (Forecasting)

Goal: Find the linear combination of $X_1, \cdots, X_n$ that forecasts $X_{n+h}$ with minimum MSE.

Write the best linear predictor of $X_{n+h}$ as:

$$P_nX_{n+h} = a_0+a_1X_n+a_2X_{n-1}+\cdots+a_nX_1$$

We want to find $a_0, a_1, \cdots, a_n$ to minimize:

$$S(a_0, a_1, \cdots, a_n) = E[(X_{n+h} - (a_0+a_1X_n+a_2X_{n-1}+\cdots+a_nX_1))^2]$$

$S$ is quadratic and bounded below by zero, so there exists at least one solution to $\frac{\partial}{\partial a_j}S(a_0, \cdots, a_n)=0$ for $j = 0, 1, \cdots, n$.

Taking derivatives (and assuming we can safely interchange $\frac{\partial}{\partial a_j}$ and $E[\cdot]$) gives:

$$\frac{\partial}{\partial a_0}S(a_0, \cdots, a_n)=0 \Rightarrow E\left[X_{n+h} - a_0 - \sum^n_{i=1}a_iX_{n+1-i}\right]=0 \tag{1}$$

$$\frac{\partial}{\partial a_j}S(a_0, \cdots, a_n)=0 \Rightarrow E\left[\left(X_{n+h} - a_0 - \sum^n_{i=1}a_iX_{n+1-i}\right)X_{n+1-j}\right]=0, \quad j=1,\cdots,n \tag{2}$$

From equation (1) we have:

$$a_0 = E(X_{n+h}) - \sum_{i=1}^n a_i E(X_{n+1-i}) =\mu-\mu \sum^n_{i=1} a_i$$
since $\{X_t\}$ is stationary.

Plugging this into equation (2), we have:

$$E\left[\left(X_{n+h} - \mu\left(1-\sum^n_{i=1} a_i\right) - \sum^n_{i=1}a_iX_{n+1-i}\right)X_{n+1-j}\right] = 0$$

$$E[(X_{n+h}-\mu)X_{n+1-j}]=E\left[\sum^n_{i=1}a_i(X_{n+1-i}-\mu)X_{n+1-j}\right]$$

$$\gamma(h+j-1) = \sum^n_{i=1}a_i E[(X_{n+1-i}-\mu)X_{n+1-j}] = \sum^n_{i=1}a_i \gamma(i-j)$$

This has the matrix form $\Gamma_n\underset{\sim}{a_n} = \underset{\sim}{\gamma_n}(h)$, where $(\Gamma_n)_{ij} = \gamma(i-j)$ and $(\underset{\sim}{\gamma_n}(h))_j = \gamma(h+j-1)$.

Therefore $P_nX_{n+h} = \mu + \sum^n_{i=1}a_i(X_{n+1-i}-\mu)$, where $\underset{\sim}{a_n}$ satisfies $\Gamma_n\underset{\sim}{a_n} = \underset{\sim}{\gamma_n}(h)$.

Clearly, $E\left[X_{n+h} - \left(\mu + \sum^n_{i=1}a_i(X_{n+1-i}-\mu)\right)\right] = 0$, so the predictor is unbiased.

$$\begin{aligned}\operatorname{MSE} &=E\left(\left(X_{n+h} - P_nX_{n+h}\right)^2\right) \\ &=E\left(\left(X_{n+h} - \left(\mu + \sum^n_{i=1}a_i(X_{n+1-i}-\mu)\right)\right)^2\right) \\ &=E\left[\left((X_{n+h} - \mu) - \sum^n_{i=1}a_i(X_{n+1-i}-\mu)\right)^2\right] \\ &=E\left((X_{n+h} - \mu)^2\right) -2E\left((X_{n+h} - \mu)\sum^n_{i=1}a_i(X_{n+1-i}-\mu)\right) + E\left(\left(\sum^n_{i=1}a_i(X_{n+1-i}-\mu)\right)^2\right) \\ &=\gamma(0) - 2\sum^n_{i=1} a_i E\left((X_{n+h}-\mu)(X_{n+1-i}-\mu)\right)+\sum^n_{i=1}\sum^n_{j=1}a_iE\left((X_{n+1-i}-\mu)(X_{n+1-j}-\mu)\right)a_j \\ &=\gamma(0) - 2\sum^n_{i=1}a_i\gamma(h+i-1) + \sum^n_{i=1}\sum^n_{j=1}a_i\gamma(i-j)a_j \\ &=\gamma(0) - 2\sum^n_{i=1}a_i\gamma(h+i-1) + \sum^n_{i=1}a_i\left(\sum^n_{j=1}\gamma(i-j)a_j\right) \\ &=\gamma(0) - 2\underset{\sim}{a_n}^T\underset{\sim}{\gamma_n}(h) + \underset{\sim}{a_n}^T\Gamma_n\underset{\sim}{a_n} \\ &=\gamma(0) - \underset{\sim}{a_n}^T\underset{\sim}{\gamma_n}(h) \end{aligned}$$

(Note: because $\underset{\sim}{a_n}$ satisfies $\Gamma_n\underset{\sim}{a_n} = \underset{\sim}{\gamma_n}(h)$, we have $\underset{\sim}{a_n}^T\Gamma_n\underset{\sim}{a_n} = \underset{\sim}{a_n}^T\underset{\sim}{\gamma_n}(h)$.)

Example AR(1)

$X_t = \phi X_{t-1} + Z_t$, where $|\phi|<1$ and $\{Z_t\} \sim WN(0, \sigma^2)$

For one-step prediction ($h=1$), we try $a_1=\phi$ and $a_k = 0$ for $k = 2, \cdots, n$.

A solution that works is $\underset{\sim}{a_n} = (\phi, 0, \cdots, 0)$, and $P_nX_{n+1} = \underset{\sim}{a_n}^T\underset{\sim}{X_n} = \phi X_n$

$$E\left(\left(X_{n+1} - P_nX_{n+1}\right)^2\right) = \gamma(0) - \underset{\sim}{a_n}^T \underset{\sim}{\gamma_n}(1) =\frac{\sigma^2}{1-\phi^2} - \phi\gamma(1) =\frac{\sigma^2}{1-\phi^2} - \phi\cdot\frac{\sigma^2\phi}{1-\phi^2} = \sigma^2$$
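(A numerical companion to this example, not from the lecture.) One can also obtain $\underset{\sim}{a_n}$ by solving $\Gamma_n\underset{\sim}{a_n} = \underset{\sim}{\gamma_n}(h)$ directly; the sketch below does this with the AR(1) autocovariance $\gamma(h)=\sigma^2\phi^{|h|}/(1-\phi^2)$ and confirms $\underset{\sim}{a_n}=(\phi, 0, \cdots, 0)$ and MSE $=\sigma^2$. The values of $\phi$, $\sigma^2$, $n$ are arbitrary.

```python
import numpy as np

phi, sigma2, n, h = 0.6, 1.0, 10, 1  # arbitrary illustrative values

def gamma(lag):
    """AR(1) autocovariance: gamma(h) = sigma^2 * phi^|h| / (1 - phi^2)."""
    return sigma2 * phi ** abs(lag) / (1 - phi**2)

# (Gamma_n)_{ij} = gamma(i - j); (gamma_n(h))_j = gamma(h + j - 1).
Gamma_n = np.array([[gamma(i - j) for j in range(n)] for i in range(n)])
gamma_nh = np.array([gamma(h + j) for j in range(n)])

a = np.linalg.solve(Gamma_n, gamma_nh)
mse = gamma(0) - a @ gamma_nh

print(np.round(a, 6))   # approximately (phi, 0, ..., 0)
print(mse, sigma2)      # approximately sigma^2
```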

Now let $Y$ and $W_1, \cdots, W_n$ be any random variables with finite second moments, means $\mu_Y = E(Y)$ and $\mu_i = E(W_i)$, and covariances $Cov(Y, Y)$, $Cov(Y, W_i)$, $Cov(W_i, W_j)$.

Let $\underset{\sim}{W} = (W_n, \cdots, W_1)$, $\underset{\sim}{\mu} = (\mu_n, \cdots, \mu_1)$, $\underset{\sim}{\gamma} = Cov(Y, \underset{\sim}{W})$, and $\Gamma = Cov(\underset{\sim}{W}, \underset{\sim}{W})$, so that $\Gamma_{ij} = Cov(W_{n+1-i}, W_{n+1-j})$.

By exactly the same method as before, the best linear predictor of $Y$ given $\underset{\sim}{W}$ is:

$$P(Y|\underset{\sim}{W}) = \mu_Y + \underset{\sim}{a}^T(\underset{\sim}{W} - \underset{\sim}{\mu_W})$$

where $\underset{\sim}{a} = (a_1, \cdots, a_n)$ is any solution of $\Gamma\underset{\sim}{a} = \underset{\sim}{\gamma}$.

Returning to the AR(1) example, assume that we observe $X_1$ and $X_3$ but not $X_2$.

Let $Y=X_2$ and $\underset{\sim}{W}=(X_1, X_3)$. We have:

$$\Gamma = \begin{pmatrix} \frac{\sigma^2}{1-\phi^2} & \frac{\phi^2\sigma^2}{1-\phi^2}\\ \frac{\phi^2\sigma^2}{1-\phi^2} & \frac{\sigma^2}{1-\phi^2} \end{pmatrix}$$

$$\underset{\sim}{\gamma} = \begin{pmatrix} Cov(X_2, X_1) \\ Cov(X_2, X_3) \end{pmatrix} = \begin{pmatrix} \frac{\sigma^2\phi}{1-\phi^2} \\ \frac{\sigma^2\phi}{1-\phi^2} \end{pmatrix}$$

Then $\Gamma\underset{\sim}{a} = \underset{\sim}{\gamma}$ becomes (after dividing both sides by $\frac{\sigma^2}{1-\phi^2}$):
$$\begin{pmatrix} 1 & \phi^2\\ \phi^2 & 1 \end{pmatrix}\underset{\sim}{a} = \begin{pmatrix} \phi\\ \phi \end{pmatrix}$$

So $\underset{\sim}{a} = \begin{pmatrix} \frac{\phi}{1+\phi^2}\\ \frac{\phi}{1+\phi^2} \end{pmatrix}$

$$P(X_2|X_1, X_3) = \frac{\phi}{1+\phi^2}X_1 + \frac{\phi}{1+\phi^2}X_3 = \frac{\phi}{1+\phi^2}(X_1+X_3)$$
(here $\mu=0$, so there is no intercept term).
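(Quick check, not from the notes.) Building the $2\times 2$ system above and solving it numerically recovers $\phi/(1+\phi^2)$ for both coefficients; $\phi$ and $\sigma^2$ below are arbitrary illustrative values.

```python
import numpy as np

phi, sigma2 = 0.6, 1.0  # arbitrary illustrative values
g = lambda h: sigma2 * phi ** abs(h) / (1 - phi**2)  # AR(1) autocovariance

Gamma = np.array([[g(0), g(2)],
                  [g(2), g(0)]])
gamma_vec = np.array([g(1), g(1)])

a = np.linalg.solve(Gamma, gamma_vec)
print(a, phi / (1 + phi**2))  # both coefficients equal phi / (1 + phi^2)
```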

$P(\cdot|\underset{\sim}{W})$ is a prediction operator, and it has useful properties:

Let $E(U^2) < \infty$, $E(V^2) < \infty$, and $\Gamma=Cov(\underset{\sim}{W}, \underset{\sim}{W})$.

Let $\beta, a_1, \cdots, a_n$ be constants.

  1. $P(U|\underset{\sim}{W}) = E(U) + \underset{\sim}{a}^T(\underset{\sim}{W} - E(\underset{\sim}{W}))$, where $\Gamma\underset{\sim}{a} = Cov(U, \underset{\sim}{W})$
  2. $E((U-P(U|\underset{\sim}{W}))\underset{\sim}{W}) = \underset{\sim}{0}$ and $E(U-P(U|\underset{\sim}{W}))=0$
  3. $E((U-P(U|\underset{\sim}{W}))^2) = Var(U) - \underset{\sim}{a}^TCov(U, \underset{\sim}{W})$
  4. $P(a_1U+a_2V+\beta|\underset{\sim}{W}) = a_1P(U|\underset{\sim}{W}) + a_2P(V|\underset{\sim}{W})+\beta$
  5. $P(\sum_{i=1}^na_iW_i + \beta|\underset{\sim}{W}) = \sum_{i=1}^na_iW_i+\beta$
  6. $P(U|\underset{\sim}{W}) = E(U)$ if $Cov(U,\underset{\sim}{W})=\underset{\sim}{0}$
  7. $P(U|\underset{\sim}{W})=P(P(U|\underset{\sim}{W},\underset{\sim}{V})|\underset{\sim}{W})$

(Note: for property 7, $P(U|\underset{\sim}{W}, \underset{\sim}{V})=\mu_U+\underset{\sim}{a}_W^T(\underset{\sim}{W}-\underset{\sim}{\mu}_W)+\underset{\sim}{a}_V^T(\underset{\sim}{V}-\underset{\sim}{\mu}_V)$, and then $P(U|\underset{\sim}{W}) = P(\mu_U + \underset{\sim}{a}_W^T(\underset{\sim}{W}-\underset{\sim}{\mu}_W)+\underset{\sim}{a}_V^T(\underset{\sim}{V}-\underset{\sim}{\mu}_V)|\underset{\sim}{W})$.)

Assume $\{X_t\}$ is a stationary process with mean 0 and auto-covariance function $\gamma(\cdot)$. We can solve for $\underset{\sim}{a}$ to determine $P_nX_{n+h}$ in terms of $\{X_n, \cdots, X_1\}$.

However, for large $n$, inverting $\Gamma_n$ is not fun!

Perhaps we can use the linearity of $P_n$ to do recursive prediction of $P_{n+1}X_{n+h}$ from $P_nX_{n+1}$.

If $\Gamma_n$ is non-singular, then $P_nX_{n+1} = \underset{\sim}{\phi_n}^T\underset{\sim}{X} = \phi_1X_n + \cdots + \phi_nX_1$
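(Sketch only.) Before introducing any recursion, the coefficients $\underset{\sim}{\phi_n}$ can always be found by solving $\Gamma_n\underset{\sim}{\phi_n} = \underset{\sim}{\gamma_n}(1)$ directly, as below; the MA(1) autocovariance and parameter values are arbitrary choices, and the point of the recursive schemes is precisely to avoid redoing this full solve every time $n$ grows.

```python
import numpy as np

theta, sigma2, n = 0.8, 1.0, 8  # arbitrary illustrative values

def gamma(h):
    """MA(1) autocovariance, used purely as an example."""
    h = abs(h)
    if h == 0:
        return (1 + theta**2) * sigma2
    return theta * sigma2 if h == 1 else 0.0

Gamma_n = np.array([[gamma(i - j) for j in range(n)] for i in range(n)])
gamma_n1 = np.array([gamma(1 + j) for j in range(n)])

phi_n = np.linalg.solve(Gamma_n, gamma_n1)

# One-step predictor coefficients: P_n X_{n+1} = phi_1 X_n + ... + phi_n X_1.
print(np.round(phi_n, 4))
```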